AITopics | Alajuela

Collaborating Authors

MessIRve: A Large-Scale Spanish Information Retrieval Dataset

Valentini, Francisco, Cotik, Viviana, Furman, Damián, Bercovich, Ivan, Altszyler, Edgar, Pérez, Juan Manuel

arXiv.org Artificial Intelligence, Sep-9-2024

Information retrieval (IR) is the task of finding relevant documents in response to a user query. Although Spanish is the world's second most spoken native language, current IR benchmarks lack Spanish data, hindering the development of information access tools for Spanish speakers. We introduce MessIRve, a large-scale Spanish IR dataset with approximately 730,000 queries from Google's autocomplete API and relevant documents sourced from Wikipedia. Unlike other datasets, which are translated from English or do not account for dialectal variation, MessIRve's queries reflect the diversity of Spanish-speaking regions. Its large size also lets it cover a wider variety of topics than smaller datasets. We provide a comprehensive description of the dataset, comparisons with existing datasets, and baseline evaluations of prominent IR models. Our contributions aim to advance Spanish IR research and improve information access for Spanish speakers.

dataset, messirve, query

2409.05994

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
North America > Mexico (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Comprehensive Survey on Heart Sound Analysis in the Deep Learning Era

Ren, Zhao, Chang, Yi, Nguyen, Thanh Tam, Tan, Yang, Qian, Kun, Schuller, Björn W.

arXiv.org Artificial Intelligence, Jan-23-2023

Heart sound auscultation has been shown to be beneficial in clinical practice for the early screening of cardiovascular diseases. Because auscultation requires well-trained professionals, automatic auscultation based on signal processing and machine learning can support auxiliary diagnosis and reduce the burden of training clinicians. Nevertheless, classic machine learning offers limited performance gains in the era of big data. Deep learning has outperformed classic machine learning in many research fields, as its more complex model architectures are better at extracting effective representations. Deep learning has been successfully applied to heart sound analysis in recent years. As most reviews of heart sound analysis predate 2017, the present survey is the first comprehensive overview of papers on heart sound analysis with deep learning published over the past six years (2017-2022). We introduce both classic machine learning and deep learning for comparison, and further offer insights into the advances and future research directions in deep learning for heart sound analysis.

artificial intelligence, classification, machine learning

2301.09362

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Portugal > Coimbra > Coimbra (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Education (1.00)
Health & Medicine > Health Care Providers & Services (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets

Yuan, Ann, Ippolito, Daphne, Nikolaev, Vitaly, Callison-Burch, Chris, Coenen, Andy, Gehrmann, Sebastian

arXiv.org Artificial Intelligence, Jan-12-2022

NLP researchers need more, higher-quality text datasets. Human-labeled datasets are expensive to collect, while datasets collected via automatic retrieval from the web, such as WikiBio, are noisy and can include undesired biases. Moreover, data sourced from the web is often included in the datasets used to pretrain models, leading to inadvertent cross-contamination of training and test sets. In this work we introduce a novel method for efficient dataset curation: we use a large language model to provide seed generations to human raters, thereby changing dataset authoring from a writing task to an editing task. We use our method to curate SynthBio, a new evaluation set for WikiBio composed of structured attribute lists describing fictional individuals, mapped to natural language biographies. We show that our dataset of fictional biographies is less noisy than WikiBio and more balanced with respect to gender and nationality.

biography, dataset, synthbio

2111.06467

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
